A NON-PARAMETRIC BAYESIAN APPROACH TO
DECOMPOUNDING FROM HIGH FREQUENCY DATA

SHOTA GUGUSHVILI, FRANK VAN DER MEULEN, AND PETER SPREIJ 


Abstract. Given a sample from a discretely observed compound Poisson process,
we consider non-parametric estimation of the density $f_0$ of its jump sizes,
as well as of its intensity $\lambda_0$. We take a Bayesian approach to the problem and
specify the prior on $f_0$ as the Dirichlet location mixture of normal densities.
An independent prior for $\lambda_0$ is assumed to be compactly supported and to possess
a positive density with respect to the Lebesgue measure. We show that
under suitable assumptions the posterior contracts around the pair $(\lambda_0, f_0)$ at
essentially (up to a logarithmic factor) the $\sqrt{n\Delta}$-rate, where $n$ is the number
of observations and $\Delta$ is the mesh size at which the process is sampled. The
emphasis is on high frequency data, $\Delta \to 0$, but the obtained results are also
valid for fixed $\Delta$. In either case we assume that $n\Delta \to \infty$. Our main result
implies existence of Bayesian point estimates converging (in the frequentist
sense, in probability) to $(\lambda_0, f_0)$ at the same rate.

We also discuss a practical implementation of our approach. The computational
problem is dealt with by inclusion of auxiliary variables and we
develop a Markov Chain Monte Carlo algorithm that samples from the joint
distribution of the unknown parameters in the mixture density and the introduced
auxiliary variables. Numerical examples illustrate the feasibility of this
approach.


1. Introduction 


1.1. Problem formulation and announcement of the main result. Let
$N = (N_t, t \geq 0)$ be a Poisson process with a constant intensity $\lambda > 0$ and let $Y_1, Y_2, Y_3, \ldots$
be a sequence of independent random variables independent of $N$ and having a
common distribution function $F$ with density $f$ (with respect to the Lebesgue
measure). A compound Poisson process (abbreviated CPP) $X = (X_t, t \geq 0)$ is
defined as

(1) $X_t = \sum_{j=1}^{N_t} Y_j,$


where the sum over an empty set is by definition equal to zero. CPPs form a 
basic model in a variety of applied fields, most notably in e.g. queueing and risk 
theory, see Embrechts et al. (1997) and Prabhu (1998) and the references therein, 
but also in other fields of science, see e.g. Alexandersson (1985), Burlando and 
Rosso (1993) for stochastic models for precipitation, Katz (2002) on modelling of
hurricane damage, or Scalas (2006) for applications in economics and finance. 

Suppose that corresponding to the ‘true’ parameter values $\lambda = \lambda_0$ and $f = f_0$,
a discrete time sample $X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta}$ is available from (1), where $\Delta > 0$.
Such a discrete time observation scheme is common in a number of applications of
CPPs, e.g. in the precipitation models of the above references. Based on the sample
$\mathcal{X}_n^\Delta = (X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta})$, we are interested in (non-parametric) estimation of $\lambda_0$
and $f_0$. Before proceeding further, we notice that by the stationary independent
increments property of a compound Poisson process, the random variables
$Z_i^\Delta = X_{i\Delta} - X_{(i-1)\Delta}$, $1 \leq i \leq n$, are independent and identically distributed. Each $Z_i^\Delta$
has the same distribution as the random variable

(2) $Z^\Delta = \sum_{j=1}^{T^\Delta} Y_j,$

where $T^\Delta$ is independent of the sequence $Y_1, Y_2, \ldots$ and has a Poisson distribution
with parameter $\lambda\Delta$. Hence, our problem is equivalent to estimating
(non-parametrically) $\lambda_0$ and $f_0$ based on the sample $\mathcal{Z}_n^\Delta = (Z_1^\Delta, Z_2^\Delta, \ldots, Z_n^\Delta)$. We will
henceforth use this alternative formulation of the problem. Our emphasis is on high
frequency data, $\Delta = \Delta_n \to 0$ as $n \to \infty$, but the obtained results are also valid for
low frequency observations, i.e. for fixed $\Delta$.
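
The reformulation in terms of i.i.d. increments can be illustrated by simulation. The following is a minimal sketch, not part of the original paper: the function names `sample_poisson` and `simulate_increments`, the Gaussian jump distribution and all numerical values are assumptions chosen for illustration only. Each increment is generated as a sum of a Poisson($\lambda\Delta$) number of jumps, as in (2).

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson(mean) variate by counting unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_increments(n, delta, lam, jump_sampler, rng):
    """Return n i.i.d. increments, each a Poisson(lam * delta) sum of jumps."""
    increments = []
    for _ in range(n):
        num_jumps = sample_poisson(lam * delta, rng)
        # The sum over an empty set of jumps is zero, as in the definition of the CPP.
        increments.append(sum(jump_sampler(rng) for _ in range(num_jumps)))
    return increments

rng = random.Random(0)
# Illustrative (assumed) values: intensity 1, mesh 0.1, N(2, 1) jumps.
z = simulate_increments(n=2000, delta=0.1, lam=1.0,
                        jump_sampler=lambda r: r.gauss(2.0, 1.0), rng=rng)
frac_zero = sum(1 for v in z if v == 0) / len(z)
print(frac_zero, math.exp(-0.1))  # empirical vs. theoretical P(no jump in a segment)
```

For small $\Delta$ most increments are exactly zero, which is the reason why the high frequency regime needs a rescaled notion of distance later on.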

Our main result is on the contraction rate of the posterior distribution, which 
we show to be, up to a logarithmic factor, $(n\Delta)^{-1/2}$. A by now standard approach
to obtain contraction rates in an IID setting is to verify the assumptions of the 
fundamental Theorem 2.1 in Ghosal et al. (2000). It should be noted that in the 
present high frequency setting, this theorem is not applicable. One of the model 
assumptions underlying this theorem, which is satisfied in Gugushvili et al. (2015), 
is that one deals with samples of a fixed distribution, whereas in our present high 
frequency observation regime the distribution of $Z^\Delta$ is varying, with the Dirac
distribution concentrated at zero as its limit for $\Delta \to 0$. Therefore we propose
an alternative approach, circumventing the use of the cited Theorem 2.1. The 
theoretical contribution of the present paper is therefore not only the statement of 
the main result itself, but also its proof. Next to this we also discuss a practical 
implementation of our non-parametric Bayesian approach, a Markov Chain Monte 
Carlo algorithm that samples from the joint distribution of the unknown parameters 
in the mixture density and certain introduced auxiliary variables. 

1.2. Literature review and present approach. Because adding a Poisson number
of $Y_j$'s amounts to compounding their distributions, the problem of recovering
the intensity $\lambda_0$ and the density $f_0$ from the observations $Z_i^\Delta$ can be referred to
as decompounding. Decompounding already has some history: the early contributions
Buchmann and Grübel (2003) and Buchmann and Grübel (2004) dealt with
estimation of the distribution function $F_0$, paying particular attention to the case
when $F_0$ is discrete, while the later contributions Comte et al. (2014), Duval (2013)
and van Es et al. (2007) concentrated on estimation of the density $f_0$ instead.
More (frequentist) theory on statistical inference on compound Poisson processes
(and more generally on Lévy processes) can be found in the volume Lévy Matters IV
(2015), with the survey paper Comte and Genon-Catalot (2015) devoted to
statistical methods for high frequency discrete observations, with a special section
on compound Poisson processes. Other references on statistics for Lévy processes
in the high frequency data setting are Comte and Genon-Catalot (2011), Comte
and Genon-Catalot (2010a), Comte and Genon-Catalot (2010b), Figueroa-López
(2008), Figueroa-López (2009), and Ueltzhöfer and Klüppelberg (2011). All these
approaches are frequentist in nature. On the other hand, theoretical and computational
advances made over the recent years have shown that a non-parametric
Bayesian approach is feasible in various statistical settings; see e.g. Hjort et al.
(2010) for an overview. This is the approach we will take in this work to estimate
$\lambda_0$ and $f_0$.

To the best of our knowledge, a non-parametric Bayesian approach to inference for
(a class of) Lévy processes was first considered in Gugushvili et al. (2015). That
paper, contrary to the present context, dealt with observations at fixed equidistant 
times, and was strongly based on an application of Theorem 2.1 of Ghosal et al. 
(2000), as already alluded to in the Problem formulation of Section 1.1. The present 
work complements the results from Gugushvili et al. (2015), in the sense that we 
now allow high frequency observations, which requires a substantially different route 
to prove our results, as we will explain in more detail in Section 1.3. 

We will study the non-parametric Bayesian approach to decompounding from 
a frequentist point of view (in the sense specified below), so that one may also 
think of it as a means for obtaining a frequentist estimator. Advantages of the
non-parametric Bayesian approach include automatic quantification of uncertainty in
parameter estimates through Bayesian posterior credible sets and automatic selection
of the degree of smoothing required in non-parametric inferential procedures.

1.3. Results. The non-parametric class $\mathcal{F}$ of densities $f$ that we consider is that
of location mixtures of normal densities. So we consider densities specified by

(3) $f(x) = f_{H,\sigma}(x) = \int \phi_\sigma(x - z)\,dH(z),$

where $\phi_\sigma$ denotes the density of the normal distribution with mean zero and
variance $\sigma^2$ and $H$ is a mixing measure. These mixtures form a rich and flexible class
of densities, see Marron and Wand (1992) and McLachlan and Peel (2000), that
are capable of closely approximating many densities that themselves are not
representable in this way. The resulting mixture densities will be infinitely smooth,
which is arguably the case in many, if not most, practical applications.

Bayesian estimation requires specification of prior distributions on $\lambda$ and $f$. We
propose independent priors on $\lambda$ and $f$ that we denote by $\Pi_1$ and $\Pi_2$, respectively.
For $f$, we take a Dirichlet mixture of normal densities as a prior. This type of prior
in the context of Bayesian density estimation has been introduced in Ferguson
(1983) and Lo (1984); for recent references see e.g. Ghosal and van der Vaart
(2001). The prior for $f$ is defined as the law of the function $f_{H,\sigma}$ as in (3), with
$H$ assumed to follow a Dirichlet process prior $D_\alpha$ with base measure $\alpha$ and $\sigma$
a priori independent with distribution $\Pi_3$. Recall that a Dirichlet process on $\mathbb{R}$
with the base measure $\alpha$ defined on the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ (we assume $\alpha$ to
be non-negative and $\sigma$-additive) is a random probability measure $G$ on $\mathbb{R}$, such
that for every finite and measurable partition $B_1, B_2, \ldots, B_k$ of $\mathbb{R}$, the probability
vector $(G(B_1), G(B_2), \ldots, G(B_k))$ possesses the Dirichlet distribution on the
$k$-dimensional simplex with parameters $(\alpha(B_1), \alpha(B_2), \ldots, \alpha(B_k))$. See e.g. the original
paper Ferguson (1973), or the overview article Ghosal (2010) for more information
on Dirichlet process priors.
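
The defining property of the Dirichlet process on a finite partition can be sketched numerically. The code below is an illustration only and is not taken from the paper: the function name `dirichlet_partition_draw` and the base-measure masses are hypothetical. It draws the vector $(G(B_1), \ldots, G(B_k))$ from the Dirichlet distribution with parameters $(\alpha(B_1), \ldots, \alpha(B_k))$ using the standard representation through normalised independent Gamma variates.

```python
import random

def dirichlet_partition_draw(base_masses, rng):
    """Draw (G(B_1), ..., G(B_k)) ~ Dirichlet(alpha(B_1), ..., alpha(B_k))
    by normalising independent Gamma(alpha(B_i), 1) variates."""
    gammas = [rng.gammavariate(mass, 1.0) for mass in base_masses]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(1)
base_masses = [0.5, 1.0, 1.5]  # assumed values of alpha(B_1), alpha(B_2), alpha(B_3)
draws = [dirichlet_partition_draw(base_masses, rng) for _ in range(5000)]
# The mean of G(B_i) is alpha(B_i) / alpha(R); here 1/6, 1/3 and 1/2.
mean_probs = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print(mean_probs)
```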

A non-parametric Bayesian approach to density estimation employing a Dirichlet
mixture of normal densities as a prior can in a very rough sense be thought of as
a Bayesian counterpart of kernel density estimation (with a Gaussian kernel), cf. 
Ghosal and van der Vaart (2007), p. 697. 

With the sample size $n$ tending to infinity, the Bayesian approach should be
able to discern the true parameter pair $(\lambda_0, f_0)$ with increasing accuracy. We can
formalise this by requiring, for instance, that for any fixed neighbourhood $A$ (in
an appropriate topology) of $(\lambda_0, f_0)$, $\Pi(A^c \mid \mathcal{Z}_n^\Delta) \to 0$ in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability. Here
$\Pi$ is used as a shorthand notation for the posterior distribution of $(\lambda, f)$ and we
use $\mathbb{Q}^\Delta_{\lambda_0,f_0}$ to denote the law of the random variable $Z^\Delta$ in (2) and $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$ the
law of $\mathcal{Z}_n^\Delta$. More generally, one may take a sequence of shrinking neighbourhoods
$A_n$ of $(\lambda_0, f_0)$ and try to determine the rate at which the neighbourhoods $A_n$ are
allowed to shrink, while still capturing most of the posterior mass. This rate is
referred to as a posterior convergence rate (we will give the precise definition in 
Section 3). Two fundamental references dealing with establishing it in various 
statistical settings are Ghosal et al. (2000) and Ghosal and van der Vaart (2001). 
This convergence rate can be thought of as an analogue of the convergence rate 
of a frequentist estimator. The analogy can be made precise: contraction of the 
posterior distribution at a certain rate implies existence of a Bayes point estimate 
with the same convergence rate (in the frequentist sense); see Theorem 2.5 in Ghosal 
et al. (2000) and the discussion on pp. 506-507 there. 

Obviously, for our programme to be successful, $\Delta$ has to satisfy the assumption
$n\Delta \to \infty$, which is a necessary condition for consistent estimation of $(\lambda_0, f_0)$, as it
ensures that asymptotically we observe an infinite number of jumps in the process.
We cover both the case of so-called high frequency observation schemes ($\Delta \to 0$) as
well as low frequency observations (fixed $\Delta$). A sufficient condition, which covers
both observation regimes and which relates $\Delta$ to $n$, is $\Delta = n^{-\alpha}$, where $0 \leq \alpha < 1$.

We note that in Ghosal and Tang (2006) and Tang and Ghosal (2007) non-parametric
Bayesian inference for Markov processes is studied, of which compound
Poisson processes form a particular class, but these papers deal with estimation of 
the transition density of a discretely observed Markov process, which is different 
from the problem we consider here. A parametric Bayesian approach to inference 
for compound Poisson processes is studied in Insua et al. (2012), Sections 5.5 and 
10.3. 

The main result of our paper is Theorem 1, in which we state sufficient conditions
on the prior that yield a posterior rate of contraction of the order $(\log^\kappa(n\Delta))/\sqrt{n\Delta}$,
for some constant $\kappa > 0$. We argue that this rate is a nearly (up to a logarithmic
factor) optimal posterior contraction rate in our problem. Our main result complements
the one in Gugushvili et al. (2015), in that it treats both the low and high
frequency observation schemes simultaneously, with emphasis on the latter. We
note (again) a fundamental difference between the present paper and Gugushvili
et al. (2015), when it comes down to the techniques to prove the main result.
As Theorem 2.1 of Ghosal et al. (2000) cannot immediately be used, we take an
alternative route that avoids this theorem, but instead refines a number of technical
results involving properties of statistical tests that form essential ingredients
of the proof in Ghosal et al. (2000). These refined results are then used as key
technical steps in a direct proof of our Theorem 1. Furthermore, our main result establishes
the posterior contraction rate for infinitely smooth jump size densities $f_0$, which
is not covered by Gugushvili et al. (2015). On the other hand, Gugushvili et al.
(2015) deals with multi-dimensional CPPs, while in this paper we consider only
the one-dimensional case. Finally, in this work we also discuss a practical implementation
of our non-parametric Bayesian approach. The computational problem
is dealt with by inclusion of auxiliary variables. More precisely, we show how a 
Markov Chain Monte Carlo algorithm can be devised that samples from the joint 
distribution of the unknown parameters in the mixture density and the introduced 
auxiliary variables. Numerical examples illustrate the feasibility of this approach. 

1.4. Organisation. The remainder of the paper is organised as follows. In the 
next section we state some preliminaries on the likelihood, prior and notation. In 
Section 3 we first motivate the use of the scaled Hellinger metric to define
neighbourhoods for which the posterior contraction rate is derived in case the observations
are sampled at high frequency. Then we present the main result on the posterior
contraction rate (Theorem 1), whose proof is given in Section 5. We discuss the 
numerical implementation of our results in Section 4. Technical lemmas and their 
proofs used to prove the main theorem are gathered in the Appendix. 


2. Preliminaries and notation 


2.1. Likelihood, prior and posterior. We are interested in Bayesian inference,
which is based on Bayes’ formula. Therefore we need to specify the likelihood in our model. We
use the following notation:

$\mathbb{P}_f$ : law of $Y_1$ (law of the jumps of the CPP)
$\mathbb{Q}^\Delta_{\lambda,f}$ : law of $Z_1^\Delta$ (law of the increments of the discretely observed CPP)
$\mathbb{Q}^{\Delta,n}_{\lambda,f}$ : law of $\mathcal{Z}_n^\Delta$ (joint law of the increments of the discretely observed CPP)
$\mathbb{R}^\Delta_{\lambda,f}$ : law of $(X_t, t \in [0, \Delta])$ (law of the CPP on $[0, \Delta]$)


The characteristic function of the Poisson sum $Z^\Delta$ defined in (2) is given by

$\phi(t) = \mathbb{E}\left[e^{itZ^\Delta}\right] = e^{-\lambda\Delta + \lambda\Delta\phi_f(t)},$

where $\phi_f$ is the characteristic function of $f$. This can be rewritten as

$\phi(t) = e^{-\lambda\Delta} + (1 - e^{-\lambda\Delta}) \frac{1}{e^{\lambda\Delta} - 1}\left(e^{\lambda\Delta\phi_f(t)} - 1\right),$

which, using the fact that $\phi_f$ vanishes at infinity, shows that the distribution of
$Z^\Delta$ is a mixture of a point mass at zero and an absolutely continuous distribution.
Letting $t \to \infty$, we get that $\phi(t) \to e^{-\lambda\Delta}$. Hence $\lambda$ is identifiable from the law of
$Z^\Delta$, and then so is $f$. The density of the law $\mathbb{Q}^\Delta_{\lambda,f}$ of $Z^\Delta$ with respect to the measure
$\mu$, which is the sum of Lebesgue measure and the Dirac measure concentrated at
zero, can in fact be written explicitly as (cf. p. 681 in van Es et al. (2007) and
Proposition 2.1 in Duval (2013))


(4) $\frac{d\mathbb{Q}^\Delta_{\lambda,f}}{d\mu}(x) = e^{-\lambda\Delta} 1_{\{0\}}(x) + (1 - e^{-\lambda\Delta}) \sum_{m=1}^{\infty} a_m(\lambda\Delta) f^{*m}(x) 1_{\mathbb{R}\setminus\{0\}}(x),$

where $1_A$ denotes the indicator of a set $A$,

(5) $a_m(\lambda\Delta) = \frac{1}{e^{\lambda\Delta} - 1} \frac{(\lambda\Delta)^m}{m!},$

and $f^{*m}$ denotes the $m$-fold convolution of $f$ with itself. However, the expression
(4) is useless for Bayesian computations. To work around this problem, we will
employ a different dominating measure. Consider the law $\mathbb{R}^\Delta_{\lambda,f}$ of $(X_t, t \in [0, \Delta])$.
By the Theorem on p. 261 in Skorohod (1964), $\mathbb{R}^\Delta_{\lambda,f}$ is absolutely continuous with
respect to $\mathbb{R}^\Delta_{\tilde\lambda,\tilde f}$ if and only if $\mathbb{P}_f$ is absolutely continuous with respect to $\mathbb{P}_{\tilde f}$ (we of
course assume that $\lambda, \tilde\lambda > 0$). A simple condition to ensure the latter is to assume
that $\tilde f$ is continuous and does not take the value zero on $\mathbb{R}$.
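
As a quick numerical sanity check of (5), the weights $a_m(\lambda\Delta)$ are the probabilities of a Poisson($\lambda\Delta$) count conditioned to be at least one, so they should sum to one. The sketch below is illustrative only; the function name `zero_truncated_poisson_weight` and the value of $\lambda\Delta$ are assumptions.

```python
import math

def zero_truncated_poisson_weight(m, mean):
    """a_m(mean) = mean**m / (m! * (exp(mean) - 1)) for m >= 1, computed on the log scale."""
    log_weight = m * math.log(mean) - math.lgamma(m + 1) - math.log(math.expm1(mean))
    return math.exp(log_weight)

lam_delta = 0.3  # assumed value of lambda * Delta
weights = [zero_truncated_poisson_weight(m, lam_delta) for m in range(1, 30)]
total = sum(weights)
print(total, weights[0])  # the first weight dominates when lambda * Delta is small
```

For small $\lambda\Delta$ nearly all mass sits on $m = 1$, which matches the heuristic that in the high frequency regime a nonzero increment is almost always a single jump.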

Define the random measure $\mu$ by

$\mu(B) = \#\{t : (t, X_t - X_{t-}) \in B\}, \quad B \in \mathcal{B}([0, \Delta]) \otimes \mathcal{B}(\mathbb{R} \setminus \{0\}).$

Under $\mathbb{R}^\Delta_{\lambda,f}$, the random measure $\mu$ is a Poisson point process on $[0, \Delta] \times (\mathbb{R} \setminus \{0\})$
with intensity measure $\Lambda(dt, dx) = \lambda\,dt\,f(x)\,dx$, which follows e.g. from Theorem 1
on p. 69 and Corollary on p. 64 in Skorohod (1964). By formula (46.1) on p. 262
in Skorohod (1964), we have

(6) $\frac{d\mathbb{R}^\Delta_{\lambda,f}}{d\mathbb{R}^\Delta_{\tilde\lambda,\tilde f}}(X) = \exp\left( \int_0^\Delta \int_{\mathbb{R}} \log\left( \frac{\lambda f(x)}{\tilde\lambda \tilde f(x)} \right) \mu(dt, dx) - \Delta(\lambda - \tilde\lambda) \right).$

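By Theorem 2 on p. 245 in Skorohod (1964) and Corollary 2 on p. 246 there, the
density $k^\Delta_{\lambda,f}$ of $\mathbb{Q}^\Delta_{\lambda,f}$ with respect to $\mathbb{Q}^\Delta_{\tilde\lambda,\tilde f}$ is given by the conditional expectation

(7) $k^\Delta_{\lambda,f}(x) = \mathbb{E}_{\tilde\lambda,\tilde f}\left( \frac{d\mathbb{R}^\Delta_{\lambda,f}}{d\mathbb{R}^\Delta_{\tilde\lambda,\tilde f}}(X) \,\Big|\, X_\Delta = x \right),$

where the subscript in the conditional expectation operator signifies the fact that
it is evaluated under the probability $\mathbb{R}^\Delta_{\tilde\lambda,\tilde f}$. Hence the likelihood (in the parameter
pair $(\lambda, f)$) associated with the sample $\mathcal{Z}_n^\Delta$ is given by the product
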

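(8) $L_n^\Delta(\lambda, f) = \prod_{i=1}^{n} k^\Delta_{\lambda,f}(Z_i^\Delta).$

An advantage of specifying the likelihood in this manner is that it allows one to
reduce some of the difficult computations for the laws $\mathbb{Q}^\Delta_{\lambda,f}$ to those for the laws
$\mathbb{R}^\Delta_{\lambda,f}$, which are simpler.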

Observe that the priors on $\lambda$ and $f$ indirectly induce the prior $\Pi = \Pi_1 \times \Pi_2$
on the collection of densities $k^\Delta_{\lambda,f}$. We will indiscriminately use the symbol $\Pi$ to
signify both the prior on $(\lambda, f)$, but also on the density $k^\Delta_{\lambda,f}$. The posterior in the
first case will be understood as the posterior for the pair $(\lambda, f)$, while in the second
case as the posterior for the density $k^\Delta_{\lambda,f}$. We will often use the same symbol $\Pi$ to
denote the posterior distribution of $(\lambda, f)$ and of the density $k^\Delta_{\lambda,f}$. This simplifies
some of the formulations below notationally.

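By Bayes’ theorem, the posterior measure of any measurable set $A \subset (0, \infty) \times \mathcal{F}$
is given by

$\Pi(A \mid \mathcal{Z}_n^\Delta) = \frac{\iint_A L_n^\Delta(\lambda, f)\,d\Pi_1(\lambda)\,d\Pi_2(f)}{\iint L_n^\Delta(\lambda, f)\,d\Pi_1(\lambda)\,d\Pi_2(f)}.$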


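Upon setting $\bar{A} = \{k^\Delta_{\lambda,f} : (f, \lambda) \in A\}$ and recalling our conventions above, this can
also be written as

$\Pi(\bar{A} \mid \mathcal{Z}_n^\Delta) = \frac{\int_{\bar{A}} \prod_{i=1}^{n} k(Z_i^\Delta)\,d\Pi(k)}{\int \prod_{i=1}^{n} k(Z_i^\Delta)\,d\Pi(k)}.$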
Once the posterior is available, one can next proceed with computation of other
quantities of interest in Bayesian statistics, such as Bayes point estimates or credible 
sets. 

2.2. Notation. Throughout the paper we will use the following notation to compare
two sequences $\{a_n\}$ and $\{b_n\}$ of positive real numbers: $a_n \lesssim b_n$ will mean that
there exists a constant $C > 0$ that is independent of $n$ and is such that $a_n \leq C b_n$,
while $a_n \gtrsim b_n$ will signify the fact that $a_n \geq C b_n$.

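Next we introduce various notions of distances between probability measures.
The Hellinger distance $h(\mathbb{Q}_0, \mathbb{Q}_1)$ between two probability laws $\mathbb{Q}_0$ and $\mathbb{Q}_1$ on a
measurable space $(\Omega, \mathfrak{F})$ is defined as

$h(\mathbb{Q}_0, \mathbb{Q}_1) = \left( \int \left( d\mathbb{Q}_0^{1/2} - d\mathbb{Q}_1^{1/2} \right)^2 \right)^{1/2}.$

Assume further $\mathbb{Q}_0 \ll \mathbb{Q}_1$. The Kullback-Leibler (or informational) divergence
$K(\mathbb{Q}_0, \mathbb{Q}_1)$ is defined as

$K(\mathbb{Q}_0, \mathbb{Q}_1) = \int \log\left(\frac{d\mathbb{Q}_0}{d\mathbb{Q}_1}\right) d\mathbb{Q}_0,$

while the V-discrepancy is defined through

$V(\mathbb{Q}_0, \mathbb{Q}_1) = \int \log^2\left(\frac{d\mathbb{Q}_0}{d\mathbb{Q}_1}\right) d\mathbb{Q}_0.$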

Here is some additional notation. For $f, g$ nonnegative integrable functions, not
necessarily densities, we write

$h^2(f, g) = \int \left(\sqrt{f} - \sqrt{g}\right)^2,$

$K(f, g) = \int f \log\frac{f}{g} - \int f + \int g,$

$V(f, g) = \int f \log^2\frac{f}{g}.$

Note that these ‘distances’ are all nonnegative and only zero if $f = g$ a.e. If $f$ and
$g$ are densities of probability measures $\mathbb{Q}_0$ and $\mathbb{Q}_1$ on $(\mathbb{R}, \mathcal{B})$ respectively, then the
above ‘distances’ reduce to the previously introduced ones.

We will also use $K(x, y) = x \log\frac{x}{y} - x + y$ for $x, y > 0$. Note that also $K(x, y) \geq 0$
and $K(x, y) = 0$ if and only if $x = y$.

3. Main result on posterior contraction rate 

Denote the true parameter values for the compound Poisson process by $(\lambda_0, f_0)$.
Recall that the problem is to estimate $f_0$ and $\lambda_0$ based on the observations $\mathcal{Z}_n^\Delta$
and that $\Delta \to 0$ in a high frequency regime. To say that a pair $(f, \lambda)$ lies in a
neighbourhood of $(f_0, \lambda_0)$, one needs a notion of distance on the corresponding
measures $\mathbb{Q}^\Delta_{\lambda,f}$ and $\mathbb{Q}^\Delta_{\lambda_0,f_0}$, the two possible induced laws of $Z_i^\Delta = X_{i\Delta} - X_{(i-1)\Delta}$.
The Hellinger distance is a popular and rather reasonable choice to that end in
non-parametric Bayesian statistics. However, for $\Delta \to 0$ the Hellinger metric $h$ between
those laws automatically tends to 0. The first assertion of Lemma 1 below states
that $h(\mathbb{Q}^\Delta_{\lambda,f}, \mathbb{Q}^\Delta_{\lambda_0,f_0})$ is of order $\sqrt{\Delta}$ when $\Delta \to 0$. This motivates replacing the
ordinary Hellinger metric $h$ with the scaled metric $h^\Delta = h/\sqrt{\Delta}$ in our asymptotic
analysis for high frequency data. Of course, for fixed $\Delta$ (in which case one can take
$\Delta = 1$ w.l.o.g.), nothing changes with this replacement. The lemma also shows that
the Kullback-Leibler divergence and the V-discrepancy are of order $\Delta$ for $\Delta \to 0$.
Therefore we will also use the scaled distances $K^\Delta = K/\Delta$ and $V^\Delta = V/\Delta$.

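Lemma 1. The following expressions hold true:

(9) $\lim_{\Delta \to 0} \frac{1}{\Delta} h^2(\mathbb{Q}^\Delta_{\lambda,f}, \mathbb{Q}^\Delta_{\lambda_0,f_0}) = h^2(\lambda f, \lambda_0 f_0) = \int \left( \sqrt{\lambda f(x)} - \sqrt{\lambda_0 f_0(x)} \right)^2 dx,$

(10) $\lim_{\Delta \to 0} \frac{1}{\Delta} K(\mathbb{Q}^\Delta_{\lambda,f}, \mathbb{Q}^\Delta_{\lambda_0,f_0}) = K(\lambda f, \lambda_0 f_0) = \lambda K(f, f_0) + K(\lambda, \lambda_0),$

(11) $\lim_{\Delta \to 0} \frac{1}{\Delta} V(\mathbb{Q}^\Delta_{\lambda,f}, \mathbb{Q}^\Delta_{\lambda_0,f_0}) = V(\lambda f, \lambda_0 f_0) = \int \log^2\left( \frac{\lambda f(x)}{\lambda_0 f_0(x)} \right) \lambda f(x)\,dx.$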

The proof will be presented in Appendix A.l. 
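
The second equality in (10), $K(\lambda f, \lambda_0 f_0) = \lambda K(f, f_0) + K(\lambda, \lambda_0)$, is a purely algebraic identity and can be checked numerically. The sketch below is not from the paper: the helper names, the choice of normal densities, the integration range and the midpoint rule are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kl_scalar(x, y):
    """K(x, y) = x log(x / y) - x + y for positive reals."""
    return x * math.log(x / y) - x + y

def kl_functions(f, g, lo=-12.0, hi=12.0, steps=24000):
    """K(f, g) = int f log(f/g) - int f + int g, approximated by the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        fx, gx = f(x), g(x)
        total += (fx * math.log(fx / gx) - fx + gx) * width
    return total

lam, lam0 = 2.0, 1.5                      # assumed intensities
f = lambda x: normal_pdf(x, 1.0, 1.0)     # assumed jump density f
f0 = lambda x: normal_pdf(x, 0.0, 1.0)    # assumed true jump density f_0

lhs = kl_functions(lambda x: lam * f(x), lambda x: lam0 * f0(x))
rhs = lam * kl_functions(f, f0) + kl_scalar(lam, lam0)
print(lhs, rhs)
```

For these two unit-variance normal densities $K(f, f_0)$ equals half the squared mean difference, which gives an independent check of the quadrature.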

Remark 1. The Hellinger process (here deterministic) of order 1/2 for continuous
observations of $X$ on an interval $[0, t]$ is given by (Jacod and Shiryaev 2003, Sections
IV.3 and IV.4a)

$\frac{t}{2} \int \left( \sqrt{\lambda f(x)} - \sqrt{\lambda_0 f_0(x)} \right)^2 dx =: h_t,$

from which it follows that $h^2(\mathbb{R}^t_{\lambda,f}, \mathbb{R}^t_{\lambda_0,f_0}) = 2 - 2\exp(-h_t)$, whose derivative
in $t = 0$ is the same as in (9) and thus equal to $2h_1$. For the Kullback-Leibler
divergence and the discrepancy V similar assertions hold. These observations have
the following heuristic explanation. For $\Delta \to 0$, there is no big difference between
observing the path of $X$ over the interval $[0, \Delta]$ and $X_\Delta$, as the probability of
$\{N_\Delta \geq 2\}$ is small (of order $\Delta^2$).
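
The claim that the probability of two or more jumps in a segment is of order $\Delta^2$ can be made concrete: for a Poisson($\lambda\Delta$) count, $P(N_\Delta \geq 2) = 1 - e^{-\lambda\Delta}(1 + \lambda\Delta)$, which behaves like $(\lambda\Delta)^2/2$ for small $\Delta$. The following short sketch (function name and numerical values are illustrative assumptions) tabulates the ratio of the two quantities.

```python
import math

def prob_at_least_two_jumps(lam, delta):
    """P(N_Delta >= 2) = 1 - P(N_Delta = 0) - P(N_Delta = 1) for N_Delta ~ Poisson(lam * delta)."""
    mean = lam * delta
    # -expm1(-mean) equals 1 - exp(-mean) and is accurate for small mean.
    return -math.expm1(-mean) - mean * math.exp(-mean)

lam = 1.0  # assumed intensity
ratios = []
for delta in (0.1, 0.01, 0.001):
    ratio = prob_at_least_two_jumps(lam, delta) / ((lam * delta) ** 2 / 2.0)
    ratios.append(ratio)
    print(delta, ratio)  # the ratio approaches 1 as delta decreases
```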

In order to determine the posterior contraction rate in our problem, we now
specify suitable neighbourhoods $A_n$ of $(\lambda_0, f_0)$, for which this will be done. Let
$M > 0$ be a constant and let $\{\varepsilon_n\}$ be a sequence of positive numbers, such that
$\varepsilon_n \to 0$ as $n \to \infty$. Let

$h^\Delta(\mathbb{Q}_0, \mathbb{Q}_1) = \frac{1}{\sqrt{\Delta}} h(\mathbb{Q}_0, \mathbb{Q}_1)$

be a rescaled Hellinger distance. Lemma 1 suggests that this is the right scaling to
use. Introduce the complements of the Hellinger-type neighbourhoods of $(\lambda_0, f_0)$,

$A(\varepsilon_n, M) = \{(\lambda, f) : h^\Delta(\mathbb{Q}^\Delta_{\lambda_0,f_0}, \mathbb{Q}^\Delta_{\lambda,f}) > M\varepsilon_n\}.$

We shall say that $\varepsilon_n$ is a posterior contraction rate, if there exists a constant $M > 0$,
such that

(12) $\Pi(A(\varepsilon_n, M) \mid \mathcal{Z}_n^\Delta) \to 0$

in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability as $n \to \infty$. Our goal in this section is to determine the ‘fastest’
rate at which $\varepsilon_n$ is allowed to tend to zero, while not violating (12).

We will assume that the observations are generated from a compound Poisson 
process that satisfies the following assumption. 

Assumption 1. (i) $\lambda_0$ is in a compact set $[\underline{\lambda}, \overline{\lambda}] \subset (0, \infty)$;
(ii) The true density $f_0$ is a location mixture of normal densities, i.e.

$f_0(x) = f_{H_0,\sigma_0}(x) = \int \phi_{\sigma_0}(x - z)\,dH_0(z)$

for some fixed distribution $H_0$ and a constant $\sigma_0 \in [\underline{\sigma}, \overline{\sigma}] \subset (0, \infty)$. Furthermore,
for some $0 < \kappa_0 < \infty$, $H_0[-\kappa_0, \kappa_0] = 1$, i.e. $H_0$ has compact support.

The more general location-scale mixtures of normal densities,

$f_{H_0,K_0}(x) = \iint \phi_\sigma(x - z)\,dH_0(z)\,dK_0(\sigma),$

possess even better approximation properties than the location mixtures of the
normals (here $H_0$ and $K_0$ are distributions) and could also be considered in our
setup. However, this would lead to additional technical complications, which could
obscure essential contributions of our work.

For obtaining posterior contraction rates we need to make some assumptions on 
the prior. 

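Assumption 2.

(i) The prior on $\lambda$, $\Pi_1$, has a density $\pi_1$ (with respect to the Lebesgue measure)
that is supported on the finite interval $[\underline{\lambda}, \overline{\lambda}] \subset (0, \infty)$ and is such that

(13) $0 < \underline{\pi}_1 \leq \pi_1(\lambda) \leq \overline{\pi}_1 < \infty, \quad \lambda \in [\underline{\lambda}, \overline{\lambda}],$

for some constants $\underline{\pi}_1$ and $\overline{\pi}_1$;

(ii) The base measure $\alpha$ of the Dirichlet process prior $D_\alpha$ has a continuous density
on an interval $[-\kappa_0 - \zeta, \kappa_0 + \zeta]$, with $\kappa_0$ as in Assumption 1 (ii), for some
$\zeta > 0$, is bounded away from zero there, and for all $t > 0$ satisfies the tail
condition

(14) $\alpha(|z| > t) \lesssim e^{-b t^\delta}$

with some constants $b > 0$ and $\delta > 0$;

(iii) The prior on $\sigma$, $\Pi_3$, is supported on the interval $[\underline{\sigma}, \overline{\sigma}] \subset (0, \infty)$ and is such
that its density $\pi_3$ with respect to the Lebesgue measure satisfies

$0 < \underline{\pi}_3 \leq \pi_3(\sigma) \leq \overline{\pi}_3 < \infty, \quad \sigma \in [\underline{\sigma}, \overline{\sigma}],$

for some constants $\underline{\pi}_3$ and $\overline{\pi}_3$.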

Assumptions 1 and 2 parallel those given in Ghosal and van der Vaart (2001) 
in the context of non-parametric Bayesian density estimation using the Dirichlet 
location mixture of normal densities as a prior. We refer to that paper for an 
additional discussion. 

The following is our main result. Note that it covers both the case of high
frequency observations ($\Delta \to 0$) and observations with fixed intersampling intervals.
We use $\Pi$ to denote the posterior on $(\lambda, f)$.

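Theorem 1. Under Assumptions 1 and 2, provided $n\Delta \to \infty$, there exists a
constant $M > 0$, such that for

$\varepsilon_n = \frac{\log^\kappa(n\Delta)}{\sqrt{n\Delta}}, \qquad \kappa = \max\left(\frac{2}{\delta}, \frac{1}{2}\right) + \frac{1}{2},$

we have

$\Pi(A(\varepsilon_n, M) \mid \mathcal{Z}_n^\Delta) \to 0$

in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability as $n \to \infty$.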
For fixed $\Delta$ (w.l.o.g. one may then assume $\Delta = 1$) the posterior contraction rate
in Theorem 1 reduces to $\varepsilon_n = \log^\kappa(n)/\sqrt{n}$. We also see that the posterior contraction
rate is controlled by the parameter $\delta$ of the tail behaviour in (14). Note that if (14)
is satisfied for some $\delta > 4$, it is also automatically satisfied for all $0 < \delta \leq 4$. The
stronger the decay rate in (14), the better the contraction rate, but all $\delta \geq 4$ give
the same value $\kappa = 1$. The best possible posterior contraction rate in Theorem 1
for minimal $\delta$ is obtained for $\delta = 4$. In the proof in Section 5 we can therefore
assume that $\delta \leq 4$.
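
To get a feeling for the rate $\log^\kappa(n\Delta)/\sqrt{n\Delta}$ under the sampling scheme $\Delta = n^{-\alpha}$ mentioned in Section 1.3, one can tabulate it for a few sample sizes. The sketch below is illustrative only: the function name `contraction_rate` is hypothetical, $\kappa$ is treated as a user-supplied input, and the chosen values of $n$, $\alpha$ and $\kappa$ are assumptions.

```python
import math

def contraction_rate(n, alpha, kappa):
    """eps_n = log(n * Delta)**kappa / sqrt(n * Delta) with Delta = n**(-alpha), 0 <= alpha < 1."""
    n_delta = n ** (1.0 - alpha)
    return math.log(n_delta) ** kappa / math.sqrt(n_delta)

for n in (10**4, 10**6, 10**8):
    # alpha = 0 is the low frequency case (Delta = 1); alpha = 0.5 is a high frequency case.
    print(n, contraction_rate(n, 0.0, 1.0), contraction_rate(n, 0.5, 1.0))
```

The output shows the price of high frequency sampling in this parametrisation: for the same $n$ the effective sample size is $n\Delta = n^{1-\alpha}$ rather than $n$.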

As on p. 1239 in Ghosal and van der Vaart (2001), and similarly to Corollary 5.1
there, Theorem 1 implies existence of a point estimate of $(\lambda_0, f_0)$ with a frequentist
convergence rate $\varepsilon_n$. The (frequentist) minimax convergence rate
relative to the Hellinger distance is unknown in our problem, but an analogy
to Ibragimov and Khas’minskii (1982) suggests that up to a logarithmic factor it
should be of order $1/\sqrt{n\Delta}$ (cf. Ghosal and van der Vaart (2001), p. 1236). The
logarithmic factor is insignificant for all practical purposes. The convergence rate
of an estimator of the Lévy density with loss measured in the $L_2$-metric in a more
general Lévy model than the CPP model is $(n\Delta)^{-\beta/(2\beta+1)}$, whenever the target
density is Sobolev smooth of order $\beta$ (cf. Comte and Genon-Catalot (2011)). Our
contraction rate is hence, roughly speaking, a limiting case of the convergence rate in
Comte and Genon-Catalot (2011) for $\beta \to \infty$.

4. Algorithms for drawing from the posterior 

In this section we discuss computational methods for drawing from the distribution
of the pair $(\lambda, f)$, conditional on $\mathcal{X}_n^\Delta$ (or equivalently: conditional on $\mathcal{Z}_n^\Delta$). In
the following there is no specific need that the observation times are equidistant.
We will assume observations at times $0 < t_1 < \cdots < t_n$ and set $\Delta_i = t_i - t_{i-1}$
($1 \leq i \leq n$, with $t_0 = 0$). Further, for consistency with notation following shortly, we set
$z_i = X_{t_i} - X_{t_{i-1}}$ and $z = (z_1, \ldots, z_n)$. We will use “Bayesian notation” throughout
and write $p$ for a probability density or mass function and use $\pi$ similarly for a prior
density or mass function.

In general, it is infeasible to generate independent realisations of the posterior
distribution of $(\lambda, f)$. To see this: from (4) one obtains that the conditional density
of a nonzero increment $z$ on a time interval of length $\Delta$ is given by

(15) $p(z \mid \lambda, f) = \frac{e^{-\lambda\Delta}}{1 - e^{-\lambda\Delta}} \sum_{k=1}^{\infty} \frac{(\lambda\Delta)^k}{k!} f^{*k}(z),$

which generally is rather intractable due to the infinite weighted sum of convolutions.
We specialise to the case where the jump size distribution is a mixture of
$J \geq 1$ Gaussians. The richness and versatility of the class of finite normal mixtures
is convincingly demonstrated in Marron and Wand (1992).

Hence, we assume

(16) $f(\cdot) = \sum_{j=1}^{J} \rho_j \phi(\cdot; \mu_j, 1/\tau), \qquad \sum_{j=1}^{J} \rho_j = 1,$

where $\phi(\cdot; \mu, \sigma^2)$ denotes the density of a random variable with $N(\mu, \sigma^2)$ distribution.
Note that in (16) we parametrise the density with the precision $\tau$. In the
“simple” case $J = 2$ the convolution density of $k$ independent jumps is given by

$f^{*k}(\cdot) = \sum_{\ell=0}^{k} \binom{k}{\ell} \rho_1^\ell \rho_2^{k-\ell} \phi(\cdot; \ell\mu_1 + (k - \ell)\mu_2, k/\tau).$

Plugging this expression into equation (15) confirms the intractable form of
$p(z \mid \lambda, f)$.
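
Although the series in (15) has no closed form, it can be evaluated approximately by truncation, which also shows why it is costly: the $k$-th term is itself a sum of $k + 1$ normal densities when $J = 2$. The sketch below is an illustration under stated assumptions and not the method used in the paper: the function names, the truncation level `max_jumps` and all parameter values are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convolution_density(x, k, rho1, mu1, mu2, tau):
    """k-fold convolution of rho1 * N(mu1, 1/tau) + (1 - rho1) * N(mu2, 1/tau)."""
    return sum(
        math.comb(k, l) * rho1 ** l * (1.0 - rho1) ** (k - l)
        * normal_pdf(x, l * mu1 + (k - l) * mu2, k / tau)
        for l in range(k + 1)
    )

def nonzero_increment_density(z, lam, delta, rho1, mu1, mu2, tau, max_jumps=40):
    """Series (15) truncated after max_jumps terms (the truncation level is an assumption)."""
    mean = lam * delta
    total = 0.0
    for k in range(1, max_jumps + 1):
        # Weight exp(-mean) * mean**k / (k! * (1 - exp(-mean))), computed on the log scale.
        log_w = k * math.log(mean) - math.lgamma(k + 1) - math.log(math.expm1(mean))
        total += math.exp(log_w) * convolution_density(z, k, rho1, mu1, mu2, tau)
    return total

density_at_one = nonzero_increment_density(1.0, lam=1.0, delta=0.5,
                                           rho1=0.3, mu1=-1.0, mu2=2.0, tau=4.0)
print(density_at_one)
```

Each likelihood evaluation costs on the order of `max_jumps` squared normal density evaluations per observation for $J = 2$, and the count grows combinatorially with $J$, which motivates the auxiliary variable approach that follows.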

We will introduce auxiliary variables to circumvent the intractable form of the
likelihood. In case the CPP is observed continuously, the problem is much easier
as now the continuous time likelihood on an interval $[0, T]$ is known to be (Shreve
(2008), Theorem 11.6.7)

$\lambda^{|V|} e^{-\lambda T} \prod_{i \in V} f(J_i),$

where the $T_i$ are the jump times of the CPP, $J_i$ the corresponding jump sizes and
$V = \{i : T_i \leq T\}$. The tractability of the continuous time likelihood naturally
suggests the construction of a data augmentation scheme. Denote the values of
the CPP in between times $t_{i-1}$ and $t_i$ by $x^{\mathrm{mis}}_{(i-1,i)}$. We will refer to $x^{\mathrm{mis}}_{(i-1,i)}$ as the
missing values on the $i$-th segment. Set

$x^{\mathrm{mis}} = \{x^{\mathrm{mis}}_{(i-1,i)}, 1 \leq i \leq n\}.$


A data augmentation scheme now consists of augmenting auxiliary variables $x^{\mathrm{mis}}$
to $(\lambda, f)$ and constructing a Markov chain that has $p(x^{\mathrm{mis}}, \lambda, f \mid z)$ as invariant
distribution. More specifically, a standard implementation of this algorithm consists
of the following steps:

1. Initialise $x^{\mathrm{mis}}$.

2. Draw $(\lambda, f) \mid (x^{\mathrm{mis}}, z)$.

3. Draw $x^{\mathrm{mis}} \mid (\lambda, f, z)$.

4. Repeat steps 2 and 3 many times.

Under weak conditions, the iterates for $(\lambda, f)$ are (dependent) draws from the posterior
distribution. Step 3 entails generating compound Poisson bridges. By the
Markov property, bridges on different segments can be drawn independently. Data 
augmentation has been used in many Bayesian computational problems, see e.g. 
Tanner and Wong (1987). The outlined scheme can be applied to the problem at 
hand, but we explain shortly that imputation of complete CPP-bridges (which is 
nontrivial) is unnecessary and we can do with less imputation, thereby effectively 
reducing the state space of the Markov chain. 

As we assume that the jumps are drawn from a non-atomic distribution, imputation
is only necessary on segments with nonzero increments. For this reason we
let

$\mathcal{I} = \{i \in \{1, \ldots, n\} : z_i \neq 0\}$

denote the set of observations with nonzero jump sizes and define the number of
segments with nonzero jumps to be $I = |\mathcal{I}|$.


4.1. Auxiliary variables. Note that if $Y \sim f$ with $f$ as in (16), then $Y$ can be
simulated by first drawing its label $L$, which equals $j$ with probability $\rho_j$, and next
drawing from the $N(\mu_L, 1/\tau)$ distribution. Knowing the labels, sampling the jumps
conditional on their sum being $z$ is much easier compared to the case with unknown
labels. Adding auxiliary variables as labels is a standard trick used for inference in
mixture models (see e.g. Diebolt and Robert (1995), Richardson and Green (1997)).
For the problem at hand, we can do with even less imputation: all we need to know
is the number of jumps of each type on every segment with nonzero jump size. For
$i \in \mathcal{I}$ and $j \in \{1, \ldots, J\}$, let $n_{ij}$ denote the number of jumps of type $j$ on segment
$i$. Denote the set of all auxiliary variables by $a = \{a_i, i \in \mathcal{I}\}$, where

$a_i = (n_{i1}, n_{i2}, \ldots, n_{iJ}).$

We will use the following additional notation: for $i \in \mathcal{I}$ and
$j = 1, \ldots, J$ we set

$n_i = \sum_{j=1}^{J} n_{ij}, \qquad s_j = \sum_{i \in \mathcal{I}} n_{ij}, \qquad s = \sum_{j=1}^{J} s_j.$

These are the number of jumps on the $i$-th segment, the total number of jumps
of type $j$ (summed over all segments) and the total number of jumps of all types
respectively.

4.2. Reparametrisation and prior specification. Instead of parametrising with $(\lambda, p_1, \ldots, p_J)$, we define

\[ \psi_j = \lambda p_j, \qquad j = 1,\ldots,J. \]

Then

\[ \lambda = \sum_{j=1}^{J} \psi_j, \qquad p_j = \frac{\psi_j}{\sum_{j=1}^{J} \psi_j}. \]

The background of this reparametrisation is the observation that a compound Poisson random variable $Z$ whose jumps are of $J$ types can be decomposed as $Z = \sum_{j=1}^{J} Z_j$, where the $Z_j$ are independent compound Poisson random variables whose jumps are of type $j$ only, and where the parameter of the Poisson random variable is $\psi_j$. In what follows we use $\theta = (\psi, \mu, \tau)$ with $\psi = (\psi_1, \ldots, \psi_J)$ and $\mu = (\mu_1, \ldots, \mu_J)$.
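A minimal sketch of the two parametrisations and the map between them, assuming the relation $\psi_j = \lambda p_j$ implied by the displayed inverse formulas. The function names `to_psi` and `from_psi` are illustrative only.

```python
import math


def to_psi(lam, p):
    """Map (lambda, p_1, ..., p_J) to (psi_1, ..., psi_J) via psi_j = lambda * p_j."""
    return [lam * p_j for p_j in p]


def from_psi(psi):
    """Recover lambda as the sum of the psi_j and p_j as the normalised psi_j."""
    lam = sum(psi)
    return lam, [psi_j / lam for psi_j in psi]


# Values of the second numerical example: lambda = 3, p = (0.8, 0.2).
psi = to_psi(3.0, [0.8, 0.2])
lam, p = from_psi(psi)
```

The $\psi_j$ are unconstrained positive numbers, whereas the weights $p_j$ must sum to one, which is one practical reason why independent Gamma priors on the $\psi_j$ are convenient.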

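Denote the Gamma distribution with shape parameter $\alpha$ and rate $\beta$ by $G(\alpha, \beta)$. We take priors

\[ \psi_1, \ldots, \psi_J \overset{\text{iid}}{\sim} G(\alpha_0, \beta_0), \]
\[ \mu \mid \tau \sim N\bigl([\xi_1, \ldots, \xi_J]',\ I_{J\times J}(\tau\kappa)^{-1}\bigr), \]
\[ \tau \sim G(\alpha_1, \beta_1), \]

with positive hyperparameters $(\alpha_0, \beta_0, \alpha_1, \beta_1, \kappa)$ fixed.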


4.3. Hierarchical model and data augmentation scheme. We construct a Metropolis-Hastings algorithm to draw from

\[ p(\theta, a \mid z) = \frac{p(\theta, z, a)}{p(z)}. \]

For an index $i \in \mathcal{I}$ we set $a_{-i} = \{a_j,\ j \in \mathcal{I} \setminus \{i\}\}$. The two main steps of the algorithm are:

(i) Update segments: for each segment $i \in \mathcal{I}$, draw $a_i$ conditional on $(\theta, z, a_{-i})$;

(ii) Update parameters: draw $\theta$ conditional on $(z, a)$.

Compared to the full data augmentation scheme discussed previously, the present approach is computationally much cheaper, as the amount of imputation scales with the number of segments that need imputation. If the time in between observations is fixed and equal to $\Delta$, then the expected number of segments for imputation equals $n(1 - e^{-\lambda\Delta})$, which for small $\Delta$ is approximately equal to $n\lambda\Delta$.
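To get a feel for the size of the imputation problem, the expected number of segments with at least one jump can be evaluated directly. The helper below is a small illustrative sketch; its name is not taken from the original text.

```python
import math


def expected_imputed_segments(n, lam, delta):
    """Expected number of segments with at least one jump: n * (1 - exp(-lam * delta))."""
    # expm1 keeps precision when lam * delta is tiny (the high frequency regime).
    return -n * math.expm1(-lam * delta)


# High frequency regime: close to the first-order approximation n * lam * delta = 50.
high_frequency = expected_imputed_segments(5000, 1.0, 0.01)

# Setting of the first numerical example (lam * delta = 1): roughly 63% of the segments.
example_one = expected_imputed_segments(5000, 1.0, 1.0)
```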

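Denote the Poisson distribution with mean $\lambda$ by $\mathcal{P}(\lambda)$. Including the auxiliary variables, we can write the observation model as a hierarchical model

\[ z_i \mid a_i, \mu, \tau \overset{\text{ind}}{\sim} N(a_i'\mu,\ n_i/\tau), \qquad n_{ij} \mid \psi \overset{\text{ind}}{\sim} \mathcal{P}(\psi_j \Delta_i) \tag{17} \]

(with $i \in \{1,\ldots,n\}$ and $j \in \{1,\ldots,J\}$). This implies

\[ p(\theta, z, a) = \pi(\theta) \times \prod_{i=1}^{n} \left( \phi(z_i;\ a_i'\mu,\ n_i/\tau) \prod_{j=1}^{J} e^{-\psi_j\Delta_i} \frac{(\psi_j\Delta_i)^{n_{ij}}}{n_{ij}!} \right). \]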
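The hierarchical model (17) can be simulated forward as follows. This is a sketch under stated assumptions: equal spacing `delta` for all segments, a simple Poisson sampler that is adequate only for small means, and illustrative function names.

```python
import math
import random


def sample_poisson(rng, mean):
    """Poisson draw by multiplying uniforms; adequate for the small means used here."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_increments(rng, n, delta, psi, mu, tau):
    """Draw (z_i, a_i), i = 1..n, from model (17) with common spacing delta."""
    z, a = [], []
    for _ in range(n):
        # n_ij ~ Poisson(psi_j * delta), independently over the jump types j
        counts = [sample_poisson(rng, psi_j * delta) for psi_j in psi]
        n_i = sum(counts)
        if n_i == 0:
            z.append(0.0)  # no jumps on this segment: the increment is exactly zero
        else:
            mean = sum(c * m for c, m in zip(counts, mu))
            # z_i | a_i ~ N(a_i' mu, n_i / tau)
            z.append(rng.gauss(mean, math.sqrt(n_i / tau)))
        a.append(counts)
    return z, a


rng = random.Random(0)
z, a = simulate_increments(rng, 200, 1.0, [0.8, 0.2], [2.0, -1.0], 1.0)
```

In an actual analysis only `z` would be observed; the counts `a` are exactly the auxiliary variables that the sampler has to impute.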


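4.4. Updating segments. Updating the $i$-th segment requires drawing from

\[ p(a_i \mid \theta, z, a_{-i}) \propto \phi(z_i;\ a_i'\mu,\ n_i/\tau) \prod_{j=1}^{J} \frac{(\psi_j\Delta_i)^{n_{ij}}}{n_{ij}!}. \]

We do this with a Metropolis-Hastings step. First we draw a proposal $n_i^\circ$ (for $n_i$) from a $\mathcal{P}(\lambda\Delta_i)$ distribution, conditioned to have nonzero outcome. Next, we draw

\[ a_i^\circ = (n_{i1}^\circ, \ldots, n_{iJ}^\circ) \sim \mathcal{MN}(n_i^\circ;\ \psi_1/\lambda, \ldots, \psi_J/\lambda), \]

where $\mathcal{MN}$ denotes the multinomial distribution. Hence the proposal density equals

\[ q(n_{i1}^\circ, \ldots, n_{iJ}^\circ \mid \theta) = \frac{e^{-\lambda\Delta_i}}{1 - e^{-\lambda\Delta_i}} \frac{(\lambda\Delta_i)^{n_i^\circ}}{n_i^\circ!} \times \frac{n_i^\circ!}{\prod_{j=1}^{J} n_{ij}^\circ!} \prod_{j=1}^{J} (\psi_j/\lambda)^{n_{ij}^\circ} = \frac{e^{-\lambda\Delta_i}}{1 - e^{-\lambda\Delta_i}} \prod_{j=1}^{J} \frac{(\psi_j\Delta_i)^{n_{ij}^\circ}}{n_{ij}^\circ!}. \]

The acceptance probability for the proposal equals $1 \wedge A$, with

\[ A = \frac{\phi(z_i;\ (a_i^\circ)'\mu,\ n_i^\circ/\tau)}{\phi(z_i;\ a_i'\mu,\ n_i/\tau)}. \]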
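The segment update can be sketched as an independence Metropolis-Hastings step. The code below is an illustration, not the authors' implementation: all function names are hypothetical, and the zero-truncated Poisson is drawn by plain rejection, which becomes slow when $\lambda\Delta_i$ is very small.

```python
import math
import random


def sample_poisson(rng, mean):
    """Poisson draw by multiplying uniforms; adequate for small means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_multinomial(rng, total, probs):
    """Allocate `total` jumps to the types, one categorical draw at a time."""
    counts = [0] * len(probs)
    for _ in range(total):
        u, acc = rng.random(), 0.0
        for j, p_j in enumerate(probs):
            acc += p_j
            if u < acc:
                counts[j] += 1
                break
        else:  # guard against rounding when the probabilities sum to just below one
            counts[-1] += 1
    return counts


def normal_logpdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def update_segment(rng, z_i, a_i, delta_i, psi, mu, tau):
    """One update of a_i; the current a_i must contain at least one jump."""
    lam = sum(psi)
    n_prop = 0
    while n_prop == 0:  # Poisson(lam * delta_i) conditioned to be nonzero
        n_prop = sample_poisson(rng, lam * delta_i)
    a_prop = sample_multinomial(rng, n_prop, [psi_j / lam for psi_j in psi])
    # The Poisson factors of target and proposal cancel, leaving a ratio of normal densities.
    log_ratio = normal_logpdf(z_i, sum(c * m for c, m in zip(a_prop, mu)), n_prop / tau)
    log_ratio -= normal_logpdf(z_i, sum(c * m for c, m in zip(a_i, mu)), sum(a_i) / tau)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return a_prop, True
    return a_i, False


rng = random.Random(1)
state = [1, 0]
for _ in range(200):
    state, accepted = update_segment(rng, 1.8, state, 1.0, [0.8, 0.2], [2.0, -1.0], 1.0)
```

Because the proposal does not depend on the current state, the acceptance ratio only involves how well the proposed counts explain the observed increment $z_i$.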


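4.5. Updating parameters. The proof of the following lemma is given in Appendix A.3.

Lemma 2. Conditional on $a$, $\psi_1, \ldots, \psi_J$ are independent and

\[ \psi_j \mid a \sim G(\alpha_0 + s_j,\ \beta_0 + T). \]

Furthermore,

\[ \mu \mid \tau, z, a \sim N\bigl(P^{-1}q,\ \tau^{-1}P^{-1}\bigr), \qquad \tau \mid z, a \sim G\bigl(\alpha_1 + I/2,\ \beta_1 + (R - q'P^{-1}q)/2\bigr), \tag{18} \]

where $P$ is the symmetric $J \times J$ matrix with elements

\[ P = \kappa I_{J\times J} + \tilde{P}, \qquad \tilde{P}_{j,k} = \sum_{i \in \mathcal{I}} n_i^{-1} n_{ij} n_{ik}, \qquad j, k \in \{1,\ldots,J\}, \tag{19} \]

$q$ is the $J$-dimensional vector with

\[ q_j = \kappa\xi_j + \sum_{i \in \mathcal{I}} n_i^{-1} n_{ij} z_i, \tag{20} \]

$R > 0$ is given by

\[ R = \kappa \sum_{j=1}^{J} \xi_j^2 + \sum_{i \in \mathcal{I}} n_i^{-1} z_i^2, \tag{21} \]

and $R - q'P^{-1}q \geq 0$.

Remark 2. If for some $j \in \{1,\ldots,J\}$ we have $s_j = 0$ (no jumps of type $j$), then the matrix $\tilde{P}$ is singular. However, adding $\kappa I_{J\times J}$ ensures invertibility of $P$.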
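The parameter update of Lemma 2 can be sketched with the standard library alone. This is an illustration under assumptions: the expressions for $P$, $q$ and $R$ are taken to be those obtained by completing the square in Appendix A.3, the Gamma shape for $\tau$ is taken to be $\alpha_1 + I/2$ with $I$ the number of segments containing jumps, and `total_time` stands for the quantity $T$ in the lemma. All function names are hypothetical.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L' = A, for a symmetric positive definite A."""
    J = len(A)
    L = [[0.0] * J for _ in range(J)]
    for r in range(J):
        for c in range(r + 1):
            acc = A[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(acc) if r == c else acc / L[c][c]
    return L


def forward_solve(L, b):
    """Solve L y = b."""
    y = []
    for r in range(len(b)):
        y.append((b[r] - sum(L[r][k] * y[k] for k in range(r))) / L[r][r])
    return y


def backward_solve(L, y):
    """Solve L' x = y."""
    J = len(y)
    x = [0.0] * J
    for r in reversed(range(J)):
        x[r] = (y[r] - sum(L[k][r] * x[k] for k in range(r + 1, J))) / L[r][r]
    return x


def sufficient_statistics(z, a, xi, kappa):
    """P, q and R of (19)-(21); segments without jumps contribute nothing."""
    J = len(xi)
    P = [[kappa if j == k else 0.0 for k in range(J)] for j in range(J)]
    q = [kappa * xi_j for xi_j in xi]
    R = kappa * sum(xi_j * xi_j for xi_j in xi)
    for z_i, a_i in zip(z, a):
        n_i = sum(a_i)
        if n_i == 0:
            continue
        for j in range(J):
            q[j] += a_i[j] * z_i / n_i
            for k in range(J):
                P[j][k] += a_i[j] * a_i[k] / n_i
        R += z_i * z_i / n_i
    return P, q, R


def update_parameters(rng, z, a, total_time, xi, kappa, alpha0, beta0, alpha1, beta1):
    J = len(xi)
    s = [sum(a_i[j] for a_i in a) for j in range(J)]
    # random.gammavariate takes (shape, scale), so the rate is inverted.
    psi = [rng.gammavariate(alpha0 + s_j, 1.0 / (beta0 + total_time)) for s_j in s]
    P, q, R = sufficient_statistics(z, a, xi, kappa)
    L = cholesky(P)
    mean = backward_solve(L, forward_solve(L, q))  # P^{-1} q
    rate = beta1 + 0.5 * (R - sum(q_j * m_j for q_j, m_j in zip(q, mean)))
    n_nonzero = sum(1 for a_i in a if sum(a_i) > 0)
    tau = rng.gammavariate(alpha1 + 0.5 * n_nonzero, 1.0 / rate)
    # L'^{-1} e / sqrt(tau) has covariance P^{-1} / tau when e is standard normal.
    noise = backward_solve(L, [rng.gauss(0.0, 1.0) / math.sqrt(tau) for _ in range(J)])
    mu = [m + e for m, e in zip(mean, noise)]
    return psi, mu, tau
```

A call such as `update_parameters(random.Random(0), z, a, 3.0, [0.0, 0.0], 1.0, 1.0, 1.0, 1.0, 1.0)` returns one draw of $(\psi, \mu, \tau)$. The Cholesky factor is reused both for the posterior mean and for the Gaussian noise, so $P$ is never inverted explicitly.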

4.6. Numerical illustrations. The first two examples concern mixtures of two normal distributions. We simulated $n = 5000$ segments with $\Delta = 1$, $\mu_1 = 2$, $\mu_2 = -1$ and $\tau = 1$. For the prior hyperparameters we took $\alpha_0 = \beta_0 = \alpha_1 = \beta_1 = 1$, $\xi_1 = \xi_2 = 0$ and $\kappa = 1$.

The results for $\lambda\Delta = 1$, $p_1 = 0.8$, $p_2 = 0.2$ and hence $\psi_1 = 0.8$ and $\psi_2 = 0.2$ are shown in Figure 1. The densities obtained from the posterior mean of the parameter estimates and the true density are shown in Figure 2. The average acceptance probability for updating the segments was 51%.

The results for $\lambda\Delta = 3$, $p_1 = 0.8$, $p_2 = 0.2$ and hence $\psi_1 = 2.4$ and $\psi_2 = 0.6$ are shown in Figure 3. The densities obtained from the posterior mean of the parameter estimates and the true density are shown in Figure 4. The average acceptance probability for updating the segments was 41%. Observe that the autocorrelation functions of the iterations of the $\psi_i$ in the second case display a much slower decay.

We also assessed the performance of our method on a more complicated example where we took a mixture of four normals. Here $\Delta = 1$, $(\mu_1, \mu_2, \mu_3, \mu_4) = (-1, 0, 0.8, 2)$, $(\psi_1, \psi_2, \psi_3, \psi_4) = (0.3, 0.4, 0.2, 0.1)$ (hence $\lambda = 1$) and $\tau^{-1} = 0.09$. The results obtained after simulating $n = 10\,000$ segments are shown in Figures 5 and 6.

Mixtures of normals need not be multimodal and can also yield skew densities. As an example, we consider the case where $(\mu_1, \mu_2) = (0, 2)$, $(\psi_1, \psi_2) = (1.5, 0.5)$ (hence $\lambda = 2$) and $\tau = 1$. Data were generated and discretely sampled with $\Delta = 1$ and $n = 5000$ segments. A plot of the posterior mean is shown in Figure 7.
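The density plots compare the true jump size density with the mixture density evaluated at posterior means. A minimal sketch of both ingredients is given below; the function names, and the layout of `draws` as a list of parameter vectors, are assumptions made for illustration.

```python
import math


def jump_density(x, psi, mu, tau):
    """Mixture density f(x) = sum_j p_j N(x; mu_j, 1/tau) with weights p_j = psi_j / sum(psi)."""
    lam = sum(psi)
    sd = 1.0 / math.sqrt(tau)
    total = 0.0
    for psi_j, mu_j in zip(psi, mu):
        zscore = (x - mu_j) / sd
        total += (psi_j / lam) * math.exp(-0.5 * zscore * zscore) / (sd * math.sqrt(2.0 * math.pi))
    return total


def posterior_mean(draws, burn_in, thin):
    """Componentwise average of the saved iterates after burn-in and subsampling."""
    kept = draws[burn_in::thin]
    return [sum(column) / len(kept) for column in zip(*kept)]


# True density of the skew example: (mu_1, mu_2) = (0, 2), (psi_1, psi_2) = (1.5, 0.5), tau = 1.
grid = [-10.0 + 0.01 * k for k in range(2201)]
mass = sum(jump_density(x, [1.5, 0.5], [0.0, 2.0], 1.0) for x in grid) * 0.01
```

The Riemann sum `mass` is a quick sanity check that the evaluated curve integrates to approximately one over the plotted range.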


4.7. Discussion. As can be seen from the autocorrelation plots, mixing of the chain deteriorates when $\lambda\Delta$ increases. As the focus in this article is on high frequency data, where there are on average only a few jumps in between observations, we do not go into details on improving the algorithm. We remark that a non-centred parametrisation (see for instance Papaspiliopoulos et al. (2007)) may give more satisfactory results when $\lambda\Delta$ is large. A non-centred parametrisation can be obtained by changing the hierarchical model in (17). Denote by $F_\lambda^{-1}$ the inverse cumulative distribution function of the $\mathcal{P}(\lambda)$ distribution. Let $u_{ij}$ ($i = 1,\ldots,n$ and $j = 1,\ldots,J$) be a sequence of independent $U(0,1)$ random variables and set $u = \{u_{ij},\ i = 1,\ldots,n,\ j = 1,\ldots,J\}$. By considering the hierarchical model

\[ z_i \mid u, \psi, \mu, \tau \overset{\text{ind}}{\sim} N\Bigl( \sum_{j=1}^{J} \mu_j F^{-1}_{\psi_j\Delta_i}(u_{ij}),\ \tau^{-1} \sum_{j=1}^{J} F^{-1}_{\psi_j\Delta_i}(u_{ij}) \Bigr), \qquad u_{ij} \overset{\text{iid}}{\sim} U(0,1) \tag{22} \]

($i \in \{1,\ldots,n\}$ and $j \in \{1,\ldots,J\}$), $\psi$ can be updated using a Metropolis-Hastings step. In this way $\{n_{ij}\}$ and $\psi$ are updated simultaneously.

Figure 1. Results for $\lambda = 1$ using 15,000 MCMC iterations. The trace plots show all iterations; in the other plots the first 5,000 iterations are treated as burn-in. The figures are obtained after subsampling the iterates, where only each 5th iterate was saved. The horizontal yellow lines are obtained from computing the posterior mean of $\theta$ based on the true auxiliary variables on all segments.
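The mechanism behind the non-centred parametrisation can be illustrated with the Poisson quantile function: for fixed uniforms, the jump counts become a deterministic function of $\psi$. The sketch below uses hypothetical function names and a plain search over the Poisson distribution function, which is adequate for moderate means.

```python
import math


def poisson_quantile(u, mean):
    """Smallest k with P(N <= k) >= u for N ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The pmf > 0 guard stops the search if rounding keeps the cdf just below u.
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def segment_moments(u_row, psi, mu, tau, delta):
    """Counts implied by the uniforms, with the mean and variance of z_i as in (22)."""
    counts = [poisson_quantile(u, psi_j * delta) for u, psi_j in zip(u_row, psi)]
    mean = sum(c * m for c, m in zip(counts, mu))
    variance = sum(counts) / tau
    return counts, mean, variance


# With the uniforms held fixed, changing psi changes the implied counts.
small = segment_moments([0.5, 0.3], [1.0, 1.0], [2.0, -1.0], 1.0, 1.0)
large = segment_moments([0.5, 0.3], [3.0, 1.0], [2.0, -1.0], 1.0, 1.0)
```

A Metropolis-Hastings proposal for $\psi$ then moves the counts along with it, which is the sense in which $\{n_{ij}\}$ and $\psi$ are updated simultaneously.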

Another option is to integrate out $(\mu, \tau)$ from $p(\theta, z, a)$. In this model it is even possible to integrate out $\psi$ as well. In that case only the auxiliary variables $a$ have to be updated. Yet another method to improve the efficiency of the algorithm is to use ideas from parallel tempering (cf. Chapter 11 in Brooks et al. (2011)).

Figure 2. Results for $\lambda = 1$; the first 5,000 iterations are treated as burn-in. Shown are the true jump size density and the density obtained from the posterior mean of the non-burn-in iterates.


5. Proof of Theorem 1 


There are a number of general results in Bayesian nonparametric statistics, such 
as the fundamental Theorem 2.1 in Ghosal et al. (2000) and Theorem 2.1 in Ghosal 
and van der Vaart (2001), which allow determination of the posterior contraction 
rates through checking certain conditions, but none of these results is easily and 
directly applicable in our case. The principal bottleneck is that a main assumption underlying these theorems is sampling from a fixed distribution, whereas in our high frequency setting the distributions vary with $\Delta$. Therefore, for clarity of exposition, in the proof of our main theorem we will choose an alternative path, which consists in mimicking the main steps of the proof of Theorem 2.1, involving judiciously chosen statistical tests, as in Ghosal et al. (2000), while also employing some results on the Dirichlet location mixtures of normal densities from Ghosal and van der Vaart (2001). However, a significant part of the technicalities we will encounter is characteristic of the decompounding problem only.

Throughout this section we assume that Assumptions 1 and 2 hold. Furthermore, in view of the discussion that followed Theorem 1 we will without loss of generality assume that $0 < \delta < 4$. All the technical lemmas used in this section are collected in the appendices.

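We start with the decomposition

\[ \Pi(A(\varepsilon_n, M) \mid \mathcal{Z}_n^{\Delta}) = \Pi(A(\varepsilon_n, M) \mid \mathcal{Z}_n^{\Delta})\,\phi_n + \Pi(A(\varepsilon_n, M) \mid \mathcal{Z}_n^{\Delta})(1 - \phi_n) =: \mathrm{I}_n + \mathrm{II}_n, \tag{23} \]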

where $0 \leq \phi_n \leq 1$ is a sequence of tests based on observations $\mathcal{Z}_n^{\Delta}$ and with properties to be specified below. The idea is to show that the terms on the right-hand side of the above display separately converge to zero in probability. The tests $\phi_n$ allow one to control the behaviour of the likelihood ratio

\[ \mathcal{L}_n^{\Delta}(\lambda, f) = \prod_{i=1}^{n} \frac{d\mathbb{Q}^{\Delta}_{\lambda,f}}{d\mathbb{Q}^{\Delta}_{\lambda_0,f_0}}(Z_i^{\Delta}) \]

on the set where it is not well-behaved due to the fact that $(\lambda, f)$ is 'far away' from $(\lambda_0, f_0)$.

Figure 3. Results for $\lambda = 3$ using 25,000 MCMC iterations. The trace plots show all iterations; in the other plots the first 10,000 iterations are treated as burn-in. The figures are obtained after subsampling the iterates, where only each 5th iterate was saved. The horizontal yellow lines are obtained from computing the posterior mean of $\theta$ based on the true auxiliary variables on all segments.

5.1. Construction of tests. The next lemma is an adaptation of Theorem 7.1 
from Ghosal et al. (2000) to decompounding. A proof is given in Appendix A.2. 
We use the notation $D(\varepsilon, A, d)$ to denote the $\varepsilon$-packing number of a set $A$ in a metric space with metric $d$, applied in our case with $d$ the scaled Hellinger metric $h^{\Delta}$.

Lemma 3. Let $\mathcal{Q}$ be an arbitrary set of probability measures $\mathbb{Q}^{\Delta}_{\lambda,f}$. Suppose for some non-increasing function $D(\varepsilon)$, some sequence $\{\varepsilon_n\}$ of positive numbers and every $\varepsilon > \varepsilon_n$,

\[ D\Bigl( \frac{\varepsilon}{2},\ \bigl\{ \mathbb{Q}^{\Delta}_{\lambda,f} \in \mathcal{Q} : \varepsilon \leq h^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq 2\varepsilon \bigr\},\ h^{\Delta} \Bigr) \leq D(\varepsilon). \tag{24} \]

Then for every $\varepsilon > \varepsilon_n$ there exists a sequence of tests $\{\phi_n\}$ (depending on $\varepsilon > 0$), such that

\[ \mathbb{E}_{\lambda_0,f_0}[\phi_n] \leq D(\varepsilon) \exp(-Kn\Delta\varepsilon^2) \frac{1}{1 - \exp(-Kn\Delta\varepsilon^2)}, \]
\[ \sup_{\mathbb{Q}^{\Delta}_{\lambda,f} \in \mathcal{Q}:\ h^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) > \varepsilon} \mathbb{E}_{\lambda,f}[1 - \phi_n] \leq \exp(-Kn\Delta\varepsilon^2), \]

where $K > 0$ is a universal constant.

Figure 4. Results for $\lambda = 3$; the first 10,000 iterations are treated as burn-in. Shown are the true jump size density and the density obtained from the posterior mean of the non-burn-in iterates.

Figure 5. Results for the example with a mixture of four normals using 100,000 MCMC iterations. The trace plots show all iterations; in the autocorrelation plot the first 20,000 iterations are treated as burn-in. The figures are obtained after subsampling the iterates, where only each 5th iterate was saved. The horizontal yellow lines indicate true values. The results for the other parameters are similar and therefore not displayed.

Figure 6. Results for the example with a mixture of four normals; the first 20,000 iterations are treated as burn-in. Shown are the true jump size density and the density obtained from the posterior mean of the non-burn-in iterates.

Figure 7. Results for the example with a skew density; the first 20,000 iterations are treated as burn-in. Shown are the true jump size density and the density obtained from the posterior mean of the non-burn-in iterates.

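In the proofs of Propositions 1 and 2 we need the inequalities below. There exists a constant $C \in (0, \infty)$ depending on $\underline{\lambda}$ and $\overline{\lambda}$ only, such that for all $\lambda_1, \lambda_2 \in [\underline{\lambda}, \overline{\lambda}]$ and $f_1, f_2$ it holds that

\[ K(\mathbb{Q}^{\Delta}_{\lambda_1,f_1}, \mathbb{Q}^{\Delta}_{\lambda_2,f_2}) \leq C\Delta \bigl( K(\mathbb{P}_{f_1}, \mathbb{P}_{f_2}) + |\lambda_1 - \lambda_2|^2 \bigr), \tag{25} \]
\[ V(\mathbb{Q}^{\Delta}_{\lambda_1,f_1}, \mathbb{Q}^{\Delta}_{\lambda_2,f_2}) \leq C\Delta \bigl( V(\mathbb{P}_{f_1}, \mathbb{P}_{f_2}) + K(\mathbb{P}_{f_1}, \mathbb{P}_{f_2}) + |\lambda_1 - \lambda_2|^2 \bigr), \tag{26} \]
\[ h(\mathbb{Q}^{\Delta}_{\lambda_1,f_1}, \mathbb{Q}^{\Delta}_{\lambda_2,f_2}) \leq C\sqrt{\Delta} \bigl( |\lambda_1 - \lambda_2| + h(\mathbb{P}_{f_1}, \mathbb{P}_{f_2}) \bigr). \tag{27} \]

These inequalities can be proven in the same way as Lemma 1 in Gugushvili et al. (2015).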

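Let $\varepsilon_n$ be as in Theorem 1. Throughout, $C$ denotes the above constant. For a constant $L > 0$ define the sequences $\{a_n\}$ and $\{\eta_n\}$ by

\[ a_n = L \log^{2/\delta}\Bigl( \frac{1}{\eta_n} \Bigr), \qquad \eta_n = \frac{\varepsilon_n}{4C}. \]

We will show that inequality (24) holds true for every $\varepsilon = M\varepsilon_n$ with $M > 2$ and the set of measures $\mathcal{Q}$ equal to

\[ \mathcal{Q}_n = \bigl\{ \mathbb{Q}^{\Delta}_{\lambda, f_{H,\sigma}} : \lambda \in [\underline{\lambda}, \overline{\lambda}],\ H[-a_n, a_n] \geq 1 - \eta_n,\ \sigma \in [\underline{\sigma}, \overline{\sigma}] \bigr\}. \]

As a first step, note that we have

\[ \log D\Bigl( \frac{M\varepsilon_n}{2},\ \bigl\{ \mathbb{Q}^{\Delta}_{\lambda,f} \in \mathcal{Q}_n : M\varepsilon_n \leq h^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq 2M\varepsilon_n \bigr\},\ h^{\Delta} \Bigr) \leq \log D(\varepsilon_n, \mathcal{Q}_n, h^{\Delta}) \leq \log N\Bigl( \frac{\varepsilon_n}{2}, \mathcal{Q}_n, h^{\Delta} \Bigr) = \log N\Bigl( \frac{\varepsilon_n\sqrt{\Delta}}{2}, \mathcal{Q}_n, h \Bigr), \tag{28} \]

where $N(\varepsilon_n\sqrt{\Delta}/2, \mathcal{Q}_n, h)$ is the covering number of the set $\mathcal{Q}_n$ with $h$-balls of size $\varepsilon_n\sqrt{\Delta}/2$. The first inequality in (28) follows from assuming $M > 2$. For bounding the right-hand side in (28), we have the following proposition.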


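Proposition 1. We have

\[ \log N\Bigl( \frac{\varepsilon_n\sqrt{\Delta}}{2}, \mathcal{Q}_n, h \Bigr) \lesssim \log^{4/\delta + 1}\Bigl( \frac{1}{\eta_n} \Bigr). \tag{29} \]

Proof. Define

\[ \mathcal{F}_n = \bigl\{ f_{H,\sigma} : H[-a_n, a_n] \geq 1 - \eta_n,\ \sigma \in [\underline{\sigma}, \overline{\sigma}] \bigr\}. \]

Let $\{\lambda_i\}$ be centres of the balls from a minimal covering of $[\underline{\lambda}, \overline{\lambda}]$ with $|\cdot|$-balls of size $\eta_n$. Let $\{f_j\}$ be centres of the balls from a minimal covering of $\mathcal{F}_n$ with $h$-balls of size $\eta_n$. For any $\mathbb{Q}^{\Delta}_{\lambda, f_{H,\sigma}} \in \mathcal{Q}_n$, by (27) we have

\[ h(\mathbb{Q}^{\Delta}_{\lambda, f_{H,\sigma}}, \mathbb{Q}^{\Delta}_{\lambda_i, f_j}) \leq C\sqrt{\Delta} \bigl( |\lambda - \lambda_i| + h(\mathbb{P}_{f_{H,\sigma}}, \mathbb{P}_{f_j}) \bigr) \leq 2C\sqrt{\Delta}\,\eta_n = \frac{\varepsilon_n\sqrt{\Delta}}{2} \]

by appropriate choices of $i$ and $j$. It follows that

\[ \log N\Bigl( \frac{\varepsilon_n\sqrt{\Delta}}{2}, \mathcal{Q}_n, h \Bigr) \leq \log N(\eta_n, [\underline{\lambda}, \overline{\lambda}], |\cdot|) + \log N(\eta_n, \mathcal{F}_n, h). \]

Evidently,

\[ \log N(\eta_n, [\underline{\lambda}, \overline{\lambda}], |\cdot|) \lesssim \log\Bigl( \frac{1}{\eta_n} \Bigr). \]

As we assume $\delta < 4$, we can apply the arguments on pp. 1251-1252 in Ghosal and van der Vaart (2001), see in particular formulae (5.8)-(5.10) (cf. also Theorem 3.1 and Lemma A.3 there), which yield

\[ \log N(\eta_n, \mathcal{F}_n, h) \lesssim \log^{4/\delta + 1}\Bigl( \frac{1}{\eta_n} \Bigr). \]

Combination of the above three inequalities implies the statement of the proposition. □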

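An application of Proposition 1 to (28) gives

\[ \log N\Bigl( \frac{\varepsilon_n}{2}, \mathcal{Q}_n, h^{\Delta} \Bigr) \lesssim \log^{4/\delta + 1}\Bigl( \frac{1}{\eta_n} \Bigr) \leq c_1 n\Delta\varepsilon_n^2, \]

for some positive constant $c_1$. Here, the final inequality follows from our choice for $\varepsilon_n$. Hence, (24) is satisfied for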

D{e) = exp((ci/M^ — K)nAe^). 

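By Lemma 3 there exist tests $\phi_n$ such that for all $n$ large enough

\[ \mathbb{E}_{\lambda_0,f_0}[\phi_n] \leq 2 \exp\bigl( -(KM^2 - c_1) n\Delta\varepsilon_n^2 \bigr), \tag{30} \]
\[ \sup_{\mathbb{Q}^{\Delta}_{\lambda,f} \in \mathcal{Q}_n:\ h^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) > M\varepsilon_n} \mathbb{E}_{\lambda,f}[1 - \phi_n] \leq \exp\bigl( -Kn\Delta M^2\varepsilon_n^2 \bigr). \tag{31} \]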

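5.2. Bound on $\mathrm{I}_n$ in (23). First note that by equation (30)

\[ \mathbb{E}_{\lambda_0,f_0}[\mathrm{I}_n] \leq \mathbb{E}_{\lambda_0,f_0}[\phi_n] \leq 2 \exp\bigl( -(KM^2 - c_1) n\Delta\varepsilon_n^2 \bigr). \]

Chebyshev's inequality implies that $\mathrm{I}_n$ converges to zero in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability as $n \to \infty$, as soon as $M$ is chosen so large that $KM^2 - c_1 > 0$. □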

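5.3. Bound on $\mathrm{II}_n$. Now we consider $\mathrm{II}_n$. We have

\[ \mathrm{II}_n = \frac{\iint_{A(\varepsilon_n, M)} \mathcal{L}_n^{\Delta}(\lambda, f)\, d\Pi_1(\lambda)\, d\Pi_2(f)\, (1 - \phi_n)}{\iint \mathcal{L}_n^{\Delta}(\lambda, f)\, d\Pi_1(\lambda)\, d\Pi_2(f)} =: \frac{\mathrm{III}_n}{\mathrm{IV}_n}. \]

We will show that the numerator $\mathrm{III}_n$ goes exponentially fast to zero, in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability, while the denominator $\mathrm{IV}_n$ is bounded from below by an exponential function, with $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability tending to one, in such a way that the ratio of $\mathrm{III}_n$ and $\mathrm{IV}_n$ still goes to zero in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability.

Bounding $\mathrm{III}_n$. As $1_{\{A(\varepsilon_n, M)\}} \leq 1_{\{\mathcal{Q}_n^c\}} + 1_{\{A(\varepsilon_n, M) \cap \mathcal{Q}_n\}}$, we have

\[ \mathbb{E}_{\lambda_0,f_0}[\mathrm{III}_n] \leq \Pi(\mathcal{Q}_n^c) + \iint_{\mathcal{Q}_n \cap A(\varepsilon_n, M)} \mathbb{E}_{\lambda,f}[1 - \phi_n]\, d\Pi_1(\lambda)\, d\Pi_2(f). \]

Here we applied Fubini's theorem to obtain the second term on the right-hand side, which by (31) is bounded by $\exp(-Kn\Delta M^2\varepsilon_n^2)$. Furthermore,

\[ \Pi(\mathcal{Q}_n^c) = \Pi_2\bigl( H[-a_n, a_n] < 1 - \eta_n,\ \sigma \in [\underline{\sigma}, \overline{\sigma}] \bigr) \lesssim \frac{e^{-b a_n^{\delta}}}{\eta_n}, \]

where the last inequality is formula (5.11) in Ghosal and van der Vaart (2001). Hence

\[ \mathbb{E}_{\lambda_0,f_0}[\mathrm{III}_n] \lesssim \frac{e^{-b a_n^{\delta}}}{\eta_n} + \exp\bigl( -KM^2 n\Delta\varepsilon_n^2 \bigr). \tag{32} \]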

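Bounding $\mathrm{IV}_n$. Recall $K^{\Delta} = K/\Delta$ and $V^{\Delta} = V/\Delta$. Let

\[ B^{\Delta}(\varepsilon, (\lambda_0, f_0)) = \bigl\{ (\lambda, f) : K^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq \varepsilon^2,\ V^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq \varepsilon^2 \bigr\} \]

and

\[ \bar{\varepsilon}_n = \frac{\log(n\Delta)}{\sqrt{n\Delta}}. \]

Note that $n\Delta\bar{\varepsilon}_n^2 \to \infty$ when $n \to \infty$.

We will use the following bound, an adaptation of Lemma 8.1 in Ghosal et al. (2000) to our setting, valid for every $\varepsilon > 0$ and $C > 0$,

\[ \mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}\Bigl( \iint \mathcal{L}_n^{\Delta}(\lambda, f)\, d\tilde{\Pi}(\lambda, f) \leq \exp\bigl( -(1 + C) n\Delta\varepsilon^2 \bigr) \Bigr) \leq \frac{1}{C^2 n\Delta\varepsilon^2}, \tag{33} \]

where

\[ \tilde{\Pi}(\cdot) = \frac{\Pi(\cdot \cap B^{\Delta}(\varepsilon, (\lambda_0, f_0)))}{\Pi(B^{\Delta}(\varepsilon, (\lambda_0, f_0)))} \]

is a normalised restriction of $\Pi(\cdot)$ to $B^{\Delta}(\varepsilon, (\lambda_0, f_0))$.

By virtue of (33), with $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability tending to one, for any constant $C > 0$ we have

\[ \mathrm{IV}_n \geq \iint_{B^{\Delta}(\bar{\varepsilon}_n, (\lambda_0, f_0))} \mathcal{L}_n^{\Delta}(\lambda, f)\, d\Pi_1(\lambda) \times d\Pi_2(f) > \Pi\bigl( B^{\Delta}(\bar{\varepsilon}_n, (\lambda_0, f_0)) \bigr) \exp\bigl( -(1 + C) n\Delta\bar{\varepsilon}_n^2 \bigr). \tag{34} \]

We will now work out the product probability on the right-hand side of this inequality.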

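Proposition 2. It holds that

\[ \Pi\bigl( B^{\Delta}(\bar{\varepsilon}_n, (\lambda_0, f_0)) \bigr) \gtrsim \exp\Bigl( -c \log^2\Bigl( \frac{1}{\bar{\varepsilon}_n} \Bigr) \Bigr) \]

for some constant $c$.

Proof. Let $0 < c \leq 1/\sqrt{3C}$ be a constant. Here $C$ is the constant in (25) and (26). By these inequalities it is readily seen that

\[ \bigl\{ (\lambda, f) : K(\mathbb{P}_{f_0}, \mathbb{P}_f) \leq c^2\bar{\varepsilon}_n^2,\ V(\mathbb{P}_{f_0}, \mathbb{P}_f) \leq c^2\bar{\varepsilon}_n^2,\ |\lambda_0 - \lambda|^2 \leq c^2\bar{\varepsilon}_n^2 \bigr\} \subset B^{\Delta}(\bar{\varepsilon}_n, (\lambda_0, f_0)). \]

It then follows by the independence assumption on $\Pi_1$ and $\Pi_2$ that

\[ \Pi\bigl( B^{\Delta}(\bar{\varepsilon}_n, (\lambda_0, f_0)) \bigr) \geq \Pi_1\bigl( |\lambda_0 - \lambda| \leq c\bar{\varepsilon}_n \bigr) \times \Pi_2\bigl( f : K(\mathbb{P}_{f_0}, \mathbb{P}_f) \leq c^2\bar{\varepsilon}_n^2,\ V(\mathbb{P}_{f_0}, \mathbb{P}_f) \leq c^2\bar{\varepsilon}_n^2 \bigr). \]

For the first factor on the right-hand side we have by (13) that

\[ \Pi_1\bigl( |\lambda_0 - \lambda| \leq c\bar{\varepsilon}_n \bigr) \gtrsim \bar{\varepsilon}_n. \]

As far as the second factor is concerned, for some constants $c_1, c_2$ it is bounded from below by

\[ c_1 \exp\Bigl( -c_2 \log^2\Bigl( \frac{1}{\bar{\varepsilon}_n} \Bigr) \Bigr) \]

by the same arguments as in inequality (5.17) in Ghosal and van der Vaart (2001). The result now follows by combining the two lower bounds. □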



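Combining (34) with Proposition 2, with $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability tending to one as $n \to \infty$, for any constant $C > 0$ we have

\[ \mathrm{IV}_n \geq \exp\Bigl( -(1 + C) n\Delta\bar{\varepsilon}_n^2 - c \log^2\Bigl( \frac{1}{\bar{\varepsilon}_n} \Bigr) \Bigr). \tag{35} \]

We are now ready for showing the final steps of proving that $\mathrm{II}_n$ tends to zero in $\mathbb{Q}^{\Delta,n}_{\lambda_0,f_0}$-probability. Let $G_n$ denote the set on which inequality (35) is true. Then by (32) we obtain

\[ \mathbb{E}_{\lambda_0,f_0}[\mathrm{II}_n 1_{G_n}] \lesssim \exp\Bigl( (1 + C) n\Delta\bar{\varepsilon}_n^2 + c \log^2\Bigl( \frac{1}{\bar{\varepsilon}_n} \Bigr) \Bigr) \times \Bigl( \frac{e^{-b a_n^{\delta}}}{\eta_n} + \exp\bigl( -KM^2 n\Delta\varepsilon_n^2 \bigr) \Bigr). \]

Recall that $n\Delta\bar{\varepsilon}_n^2 = \log^2(n\Delta)$. Hence, the exponent in the first factor of this display is of order $\log^2(n\Delta)$. Furthermore $a_n^{\delta} = L^{\delta} \log^2(4C/\varepsilon_n)$, which is of order $\log^2(n\Delta)$ as well. It follows that, provided the constants $L$ and $M$ are chosen large enough, the right-hand side of the above display converges to zero as $n \to \infty$. Chebyshev's inequality then implies that $\mathrm{II}_n$ converges to zero in probability as $n \to \infty$. This completes the proof of Theorem 1. □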


Acknowledgement: We wish to thank Wikash Sewlal from Delft University of 
Technology for the simulation results of the example with a mixture of four normals 
and the skewed density. 


Appendix A. Additional lemmas and proofs 


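A.1. Proof of Lemma 1. We give a detailed proof of Equality (9). As we are interested in small values of $\Delta$, we make some necessary approximations. The starting point is the expansion for the 'density' of $\mathbb{Q}^{\Delta}_{\lambda,f}$ with respect to the Lebesgue measure,

\[ e^{-\lambda\Delta}\delta_0(x) + (1 - e^{-\lambda\Delta}) \sum_{m=1}^{\infty} a_m(\lambda\Delta) f^{*m}(x), \]

see (4), with coefficients $a_m$ defined in (5). It follows that we have the likelihood ratio

\[ \frac{d\mathbb{Q}^{\Delta}_{\lambda,f}}{d\mathbb{Q}^{\Delta}_{\lambda_0,f_0}}(x) = e^{-(\lambda - \lambda_0)\Delta} 1_{[x = 0]} + \frac{(1 - e^{-\lambda\Delta}) \sum_{m=1}^{\infty} a_m(\lambda\Delta) f^{*m}(x)}{(1 - e^{-\lambda_0\Delta}) \sum_{m=1}^{\infty} a_m(\lambda_0\Delta) f_0^{*m}(x)} 1_{[x \neq 0]} = e^{-(\lambda - \lambda_0)\Delta} \Bigl( 1_{[x = 0]} + \frac{\lambda\Delta f(x) + o(\Delta)}{\lambda_0\Delta f_0(x) + o(\Delta)} 1_{[x \neq 0]} \Bigr), \]

where we collected terms of order $\Delta^m$ for $m \geq 2$ as $o(\Delta)$. Hence we get for the Hellinger affinity

\[ H(\mathbb{Q}^{\Delta}_{\lambda,f}, \mathbb{Q}^{\Delta}_{\lambda_0,f_0}) = \int \sqrt{d\mathbb{Q}^{\Delta}_{\lambda,f}\, d\mathbb{Q}^{\Delta}_{\lambda_0,f_0}} \]

the approximating expression

\[ H(\mathbb{Q}^{\Delta}_{\lambda,f}, \mathbb{Q}^{\Delta}_{\lambda_0,f_0}) = e^{-(\lambda + \lambda_0)\Delta/2} \bigl( 1 + \Delta\sqrt{\lambda\lambda_0}\, H(f, f_0) + o(\Delta) \bigr). \]

It follows that for $\Delta \to 0$,

\[ h^2(\mathbb{Q}^{\Delta}_{\lambda,f}, \mathbb{Q}^{\Delta}_{\lambda_0,f_0}) = 2 - 2H(\mathbb{Q}^{\Delta}_{\lambda,f}, \mathbb{Q}^{\Delta}_{\lambda_0,f_0}) = 2 - 2e^{-(\lambda + \lambda_0)\Delta/2} \bigl( 1 + \Delta\sqrt{\lambda\lambda_0}\, H(f, f_0) + o(\Delta) \bigr) = 2\bigl( 1 - e^{-(\lambda + \lambda_0)\Delta/2} \bigr) - 2\Delta\sqrt{\lambda\lambda_0}\, H(f, f_0) + o(\Delta). \]

Hence, for $\Delta \to 0$,

\[ \frac{1}{\Delta} h^2(\mathbb{Q}^{\Delta}_{\lambda,f}, \mathbb{Q}^{\Delta}_{\lambda_0,f_0}) \to \lambda + \lambda_0 - 2\sqrt{\lambda_0\lambda}\, H(f, f_0) = \int \Bigl( \sqrt{\lambda f(x)} - \sqrt{\lambda_0 f_0(x)} \Bigr)^2 dx. \]

Equality (9) follows. The proofs of the equalities (10) and (11) follow a similar line of reasoning.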

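A.2. Proof of Lemma 3. The proof is an adaptation of Theorem 7.1 from Ghosal et al. (2000) to decompounding. In all that follows it is assumed that $\mathbb{Q}^{\Delta}_{\lambda,f} \in \mathcal{Q}$, but we suppress this assumption in the notation. Observe that

\[ D\Bigl( \frac{\varepsilon}{2},\ \bigl\{ \mathbb{Q}^{\Delta}_{\lambda,f} : \varepsilon \leq h^{\Delta}(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq 2\varepsilon \bigr\},\ h^{\Delta} \Bigr) = D\Bigl( \frac{\varepsilon\sqrt{\Delta}}{2},\ \bigl\{ \mathbb{Q}^{\Delta}_{\lambda,f} : \varepsilon\sqrt{\Delta} \leq h(\mathbb{Q}^{\Delta}_{\lambda_0,f_0}, \mathbb{Q}^{\Delta}_{\lambda,f}) \leq 2\varepsilon\sqrt{\Delta} \bigr\},\ h \Bigr). \]

From this point on the arguments from the proof of Theorem 7.1 in Ghosal et al. (2000) are applicable (with $\varepsilon$ replaced by $\varepsilon\sqrt{\Delta}$) and eventually lead to the desired result. The role of formulae (7.1)-(7.2) in that proof is played in the present context by (36) and (37) below.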

For a given (Ai, /i) there exists a sequence of tests based on Z^, such that 


(36) 

(37) 


sup 


EAo./o[<?i’n] < exp , 

Ea,/[1 - (t)n] < exp (^-^nAh^{Qxojo,Q\j) 


These two inequalities simply follow by rewriting the inequalities 

1 


'2,^ A 


iiA 


EAo,/o['('n] < exp ( -^nh vvaoJcd va,/; ; > 


sup 




Ea,/[ 1 - (()„] < exp ( - l-nh 








which are proved on pp. 520-521 in Ghosal et al. (2000) and rely upon the results in Birgé (1984) and Le Cam (1986).

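A.3. Proof of Lemma 2. As the priors for $\psi_1, \ldots, \psi_J$ are independent, we obtain that

\[ p(\psi \mid \mu, \tau, z, a) = p(\psi \mid a) \propto \prod_{j=1}^{J} \bigl( e^{-\psi_j T} \psi_j^{s_j} \bigr) \prod_{j=1}^{J} \pi(\psi_j), \]

which proves the first statement of the lemma.

For $(\mu, \tau)$ we get

\[ p(\mu, \tau \mid z, a) \propto \prod_{i \in \mathcal{I}} \phi(z_i;\ a_i'\mu,\ n_i/\tau) \times \tau^{\alpha_1 - 1} e^{-\beta_1\tau}\, \tau^{J/2} \exp\Bigl( -\frac{\tau\kappa}{2} \sum_{j=1}^{J} (\mu_j - \xi_j)^2 \Bigr). \]

This is proportional to

\[ \tau^{\alpha_1 + (I + J)/2 - 1} \exp\Bigl( -\beta_1\tau - \frac{\tau}{2} D(\mu) \Bigr), \]

where

\[ D(\mu) = \kappa \sum_{j=1}^{J} (\mu_j - \xi_j)^2 + \sum_{i \in \mathcal{I}} n_i^{-1} (z_i - a_i'\mu)^2. \]

From this expression it is easily seen that we can integrate out $\mu$ to obtain the distribution of $\tau$, conditional on $(z, a)$. To get this right, write $D(\mu)$ as a quadratic form of $\mu$:

\[ D(\mu) = \mu' P \mu - 2q'\mu + R. \]

By completing the square, we find that

\[ \int \exp\Bigl( -\frac{\tau}{2} D(\mu) \Bigr) d\mu = e^{-\tau R/2} \int \exp\Bigl( -\frac{\tau}{2} \mu' P \mu + \tau q'\mu \Bigr) d\mu. \]

The integrand is (up to a proportionality constant) the density of a multivariate normal random vector with mean vector $P^{-1}q$ and covariance matrix $\tau^{-1}P^{-1}$, evaluated in $\mu$. This implies that the preceding display equals

\[ (2\pi)^{J/2}\, \tau^{-J/2}\, |P|^{-1/2} \exp\Bigl( -\frac{\tau}{2} (R - q'P^{-1}q) \Bigr). \]

We conclude that

\[ p(\tau \mid z, a) \propto \tau^{\alpha_1 + I/2 - 1} \exp\Bigl( -\bigl( \beta_1 + \tfrac{1}{2}(R - q'P^{-1}q) \bigr) \tau \Bigr), \]

which proves the asserted Gamma distribution of $\tau$. This computation also immediately leads to the assertion on the distribution of $\mu$. We finally show that the rate parameter appearing for $\tau$ is positive. By definition $D(\mu) \geq 0$ for all $\mu$. This implies that $D(P^{-1}q) = q'P^{-1}q - 2q'P^{-1}q + R = R - q'P^{-1}q \geq 0$.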